NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Turn-based Spatiotemporal Coherence for GPUs

https://doi.org/10.1145/3593054

Puthoor, Sooraj; Lipasti, Mikko H. (September 2023, ACM Transactions on Architecture and Code Optimization)

This article introduces turn-based spatiotemporal coherence. Spatiotemporal coherence is a novel coherence implementation that assigns write permission to epochs (or turns) as opposed to a processor core. This paradigm shift in the assignment of write permissions satisfies all conditions of a coherence protocol with virtually no coherence overhead. We discuss the implementation of this coherence mechanism on a baseline GPU. The evaluation shows that spatiotemporal coherence achieves a speedup of 7.13% for workloads with read data reuse across kernels compared to the baseline software-managed GPU coherence implementation, while also providing write atomicity and avoiding the need for software-inserted acquire-release operations.
TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency

https://doi.org/10.1145/3597611

Ravi, Gokul Subramanian; Krishna, Tushar; Lipasti, Mikko (September 2023, ACM Transactions on Architecture and Code Optimization)

The ideal latency for on-chip network traversal would be the delay incurred from wire traversal alone. Unfortunately, in a realistic modular network, the latency for a packet to traverse the network is significantly higher than this wire delay. The main limiter to achieving lower latency is the modular quantization of network traversal into hops. Beyond this, the physical heterogeneity of real-world systems further complicates the ability to reach ideal wire-only delay. In this work, we propose TNT, or Transparent Network Traversal. TNT targets ideal network latency by attempting source-to-destination network traversal as a single multi-cycle ‘long-hop’, bypassing the quantization effects of intermediate routers via transparent data/information flow. TNT is built in a modular, tile-scalable manner via a novel control path that performs neighbor-to-neighbor interactions but enables end-to-end transparent flit traversal. Further, TNT’s fine-grained, on-the-fly delay tracking allows it to cope with physical NOC heterogeneity across the chip. Analysis on Ligra graph workloads shows that TNT can reduce NOC latency by as much as 43% compared to the state of the art and allows efficiency gains of up to 38%. Further, it can achieve more than 3x the benefits of the best/closest alternative research proposal, SMART [43].
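The quantization argument above can be made concrete with a toy latency model. The Python sketch below contrasts hop-by-hop traversal, where every hop pays a router pipeline and rounds its own link delay up to a whole cycle, with an idealized single long hop that accumulates the wire delays along the path and rounds up only once. It is not TNT's control path: the link delays, clock period, and router pipeline depth are hypothetical values chosen for illustration, and the per-link delay list only loosely stands in for the heterogeneous delays that TNT tracks on the fly.

```python
from typing import List


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers (avoids float rounding)."""
    return -(-a // b)


def hop_by_hop_cycles(link_delays_ps: List[int], clock_ps: int, router_cycles: int) -> int:
    """Latency when every hop pays a router pipeline plus its own quantized link delay."""
    return sum(router_cycles + ceil_div(d, clock_ps) for d in link_delays_ps)


def long_hop_cycles(link_delays_ps: List[int], clock_ps: int) -> int:
    """Idealized latency: accumulate the (possibly unequal) wire delays link by
    link and quantize to clock cycles only once, at the destination."""
    elapsed_ps = 0
    for d in link_delays_ps:
        elapsed_ps += d  # running delay tracked per link
    return ceil_div(elapsed_ps, clock_ps)


if __name__ == "__main__":
    # Hypothetical 4-hop path with unequal link delays, in picoseconds.
    links = [120, 300, 180, 250]
    print(hop_by_hop_cycles(links, clock_ps=500, router_cycles=2))  # 12
    print(long_hop_cycles(links, clock_ps=500))                     # 2
```

With these made-up parameters the quantized path costs 12 cycles against 2 for the ideal long hop. The gap only illustrates the kind of quantization loss the abstract describes; it is not a reproduction of TNT's reported results.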
TailWAG: Tail Latency Workload Analysis and Generation

https://doi.org/10.1145/3583060.3583170

Zhuo, Heng; Lipasti, Mikko Herman (February 2023, ACM)

Work-in-Progress: NoRF: A Case Against Register File Operands in Tightly-Coupled Accelerators

https://doi.org/10.1109/CASES55004.2022.00028

Schlais, David J.; Zhuo, Heng; Lipasti, Mikko H. (October 2022, Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES))

PrGEMM: A Parallel Reduction SpGEMM Accelerator

https://doi.org/10.1145/3526241.3530387

Chen, Chien-Fu; Lipasti, Mikko (June 2022, GLSVLSI '22: Proceedings of the Great Lakes Symposium on VLSI 2022)

Due to increasing data sparsity in scientific data sets and pruned neural networks, it has become more challenging to compute efficiently with such sparse data sets. Several works discuss efficient sparse matrix-vector multiplication (SpMV). However, because of index irregularity in compactly stored matrices, sparse matrix-matrix multiplication (SpGEMM) still suffers from a trade-off between storage space and computational efficiency. In this work, we propose PrGEMM, a multiple-reduction scheme that (1) computes SpGEMM under a compact storage format without expanding the operands, and (2) uses index lookahead to compute and compare multiple index-data pairs at the same time without violating index order. We evaluate our work on matrices of different sizes from the SuiteSparse data set. Our work achieves a 3.3x improvement in execution cycles compared to the state-of-the-art SpGEMM scheme.
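As a software analogy for point (1) and the index-ordering constraint in point (2), the Python sketch below computes C = A × B row by row (Gustavson's row-wise formulation) while keeping every row in compact (column index, value) form. Each output row is produced by a k-way merge of scaled rows of B that sums values sharing a column index, so indices come out in sorted order and no dense row is ever materialized. This is an assumption-laden illustration, not PrGEMM's hardware design: the list-of-rows format and the helper names are hypothetical, and the index lookahead that lets PrGEMM compare several index-data pairs per step is not modeled.

```python
import heapq
from typing import List, Tuple

# Hypothetical compact format: each row is a list of (column, value) pairs
# sorted by column, similar to one row of a CSR matrix.
SparseRow = List[Tuple[int, float]]


def scale_row(row: SparseRow, factor: float) -> SparseRow:
    """Multiply every stored value of a compact row by a scalar."""
    return [(col, factor * val) for col, val in row]


def merge_reduce(rows: List[SparseRow]) -> SparseRow:
    """Merge sorted compact rows, summing values that share a column index.

    heapq.merge yields pairs in nondecreasing column order, so equal indices
    arrive back to back and can be reduced without a dense accumulator.
    """
    out: SparseRow = []
    for col, val in heapq.merge(*rows, key=lambda pair: pair[0]):
        if out and out[-1][0] == col:
            out[-1] = (col, out[-1][1] + val)  # reduce a duplicate index
        else:
            out.append((col, val))
    return out


def spgemm(a: List[SparseRow], b: List[SparseRow]) -> List[SparseRow]:
    """Row-wise SpGEMM: row i of C is the sum over k of A[i, k] * B[k, :]."""
    return [merge_reduce([scale_row(b[k], a_ik) for k, a_ik in a_row]) for a_row in a]


if __name__ == "__main__":
    A = [[(0, 1.0), (2, 2.0)], [(1, 3.0)]]                        # 2 x 3
    B = [[(0, 1.0), (1, 4.0)], [(2, 5.0)], [(0, 2.0), (2, 1.0)]]  # 3 x 3
    print(spgemm(A, B))  # [[(0, 5.0), (1, 4.0), (2, 2.0)], [(2, 15.0)]]
```

A merge-based reduction trades a dense accumulator for sorted-order bookkeeping, which mirrors the abstract's goal of computing on compact operands; a hardware design would additionally need to decide how many index-data pairs to compare per cycle.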
Systems-on-Chip with Strong Ordering

https://doi.org/10.1145/3428153

Puthoor, Sooraj; Lipasti, Mikko H. (January 2021, ACM Transactions on Architecture and Code Optimization)
Sequential consistency (SC) is the most intuitive memory consistency model and the easiest for programmers and hardware designers to reason about. However, the strict memory ordering restrictions imposed by SC make it less attractive from a performance standpoint. Additionally, prior high-performance SC implementations required complex hardware structures to support speculation and recovery. In this article, we introduce the lockstep SC consistency model (LSC), a new memory model based on SC but carefully defined to accommodate the data-parallel lockstep execution paradigm of GPUs. We also describe an efficient LSC implementation for an APU system-on-chip (SoC) and show that our implementation performs close to the baseline relaxed model. Evaluation of our implementation shows that the geometric mean performance cost for lockstep SC is just 0.76% for GPU execution and 6.11% for the entire APU SoC compared to a baseline with a weaker memory consistency model. Adoption of LSC in future APU and SoC designs will reduce the burden on programmers trying to write correct parallel programs, while also simplifying the implementation and verification of systems with heterogeneous processing elements and complex memory hierarchies.
SHASTA: Synergic HW-SW Architecture for Spatio-temporal Approximation

https://doi.org/10.1145/3412375

Ravi, Gokul Subramanian; Miguel, Joshua San; Lipasti, Mikko (September 2020, ACM Transactions on Architecture and Code Optimization)
BlurNet: Defense by Filtering the Feature Maps

https://doi.org/10.1109/DSN-W50199.2020.00016

Raju, Ravi S.; Lipasti, Mikko (June 2020, 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W))
Modeling Architectural Support for Tightly-Coupled Accelerators

https://doi.org/10.1109/ISPASS48437.2020.00045

Schlais, David J.; Zhuo, Heng; Lipasti, Mikko H. (August 2020, ISPASS Proceedings)
Value Locality Based Approximation With ODIN

https://doi.org/10.1109/LCA.2020.3002542

Singh, Rahul; Ravi, Gokul Subramanian; Lipasti, Mikko; Miguel, Joshua San (July 2020, IEEE Computer Architecture Letters)